On this page

  • 1 Exploring Air Quality Trends
  • 2 Final Infographic
  • 3 Building the Infographic
    • 3.1 Finding Inspiration
    • 3.2 R Setup, Data, and Wrangling
    • 3.3 Mapping PM 2.5 in Los Angeles
    • 3.4 Identifying Pollution Sources
    • 3.5 Visualizing Pollution Over Time
    • 3.6 Putting it All Together in Affinity Designer
    • 3.7 Design Elements
  • 4 Takeaways
  • 5 Explore full code

Tracking PM2.5 Trends in Los Angeles

Trends
An overview of how I created a data-driven infographic on PM2.5 levels in Los Angeles (2010–2024), from data analysis in RStudio to design refinement in Affinity Designer.
Author

Natalie Smith

Published

March 5, 2025

As a long-time Angeleno, I’m no stranger to hazy days and pollution-soaked sunsets, but this past December felt different. The air was thick, smothering the city in a gray blanket that lingered for weeks. I found myself wondering: was the pollution actually worse, or was I simply more aware of it this time? To find out whether my perception was backed by data, I took a deeper dive into air quality trends in Los Angeles.

Downtown L.A.’s skyline obscured by smog in December 2024. (Getty Images)


1 Exploring Air Quality Trends

I began by exploring air quality datasets from the EPA, focusing on the annual median Air Quality Index (AQI) for Los Angeles County. While reviewing long-term AQI trends, I became curious about which specific pollutants were driving poor air quality. By calculating how frequently, on average, each pollutant appeared in the data since 2000, I found that PM2.5 was the dominant contributor, with an average share of 49%. This discovery led me to ask a new set of questions: Where is PM2.5 most concentrated? What are its major sources? And how has it changed over time?
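As a rough illustration of that frequency calculation, the sketch below averages each pollutant's yearly share of days. It assumes a table with one row per year and one `days_*` column per pollutant; the column names and the numbers are placeholders made up for this example, not EPA values, and this is not necessarily how the original analysis was coded.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for annual AQI summaries. All values are invented for
# illustration only and are NOT real EPA figures.
aqi_toy <- data.frame(
  year       = c(2000, 2001),
  days_ozone = c(40, 30),
  days_pm2_5 = c(50, 60),
  days_no2   = c(10, 10)
)

pollutant_share <- aqi_toy %>%
  # one row per year-pollutant pair
  pivot_longer(starts_with("days_"),
               names_to = "pollutant",
               names_prefix = "days_",
               values_to = "days") %>%
  # share of days attributed to each pollutant within a year
  group_by(year) %>%
  mutate(share = days / sum(days)) %>%
  # average that share across years
  group_by(pollutant) %>%
  summarise(mean_share = mean(share), .groups = "drop") %>%
  arrange(desc(mean_share))

print(pollutant_share)
```

Averaging the yearly shares (rather than pooling all days) gives every year equal weight, which matters if some years have fewer reported days.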

2 Final Infographic

Infographic showing PM2.5 pollution trends in greater Los Angeles.


This infographic consists of three separate plots that I created in R and then combined using Affinity Designer. Below, I outline the process I used to develop this graphic.

3 Building the Infographic

3.1 Finding Inspiration

I began by gathering inspiration and experimenting with various plot styles. A photo of a smoggy LA skyline inspired my color choices. I used a color grabber tool to extract a palette from the image, then adjusted it for accessibility so that it would be readable for all viewers. For typography, I selected Montserrat and Open Sans, which offer a balance of professionalism and readability without detracting from the visuals. For more about the general design and individual design choices, see section 3.7, Design Elements.

Creating a vision board in Affinity Designer for the infographic using an LA skyline photo. The color palette was extracted from the image, with typography options laid out alongside.

3.2 R Setup, Data, and Wrangling

In the next phase of the project, I set up my environment in RStudio by loading the necessary libraries, setting up my color palette, importing custom fonts, and loading the data from CalEnviroScreen and the EPA. Each visualization required a significant amount of data wrangling, and throughout this blog post you can click and expand the code chunks to view more details about the process. Links to the data sources and the specific code used for wrangling are provided in the following sections.

Code
# Set up the environment: load libraries, import Google fonts, and define
# the color palettes used across the visualizations.

# libraries
library(tidyverse)
library(janitor)
library(lubridate)
library(here)
library(doBy)
library(scales)
library(showtext)
library(glue)
library(ggtext)
library(sf)

#......................import Google fonts.......................
# `name` is the name of the font as it appears in Google Fonts
# `family` is the user-specified id that you'll use to apply a font in your ggplot
font_add_google(name = "Montserrat", family = "mont")
font_add_google(name = "Open Sans", family = "open_sans")

# turn showtext on so the imported fonts are used when rendering plots
showtext_auto()

# option 1
smog_pal <- c("#79AAB6",
              "#B0C7C1",
              "#E3AC79",
              "#CC7E62",
              "#B77B70")

# option 2
smog_pal2 <- c("#79AAB6",
               "#B0C7C1",
               "#DFAF75",
               "#DE8635",
               "#DF674F")

# option 3: the two warmest colors from option 2
smog_sub_pal <- smog_pal2[c(4, 5)]

3.3 Mapping PM 2.5 in Los Angeles

To answer where PM2.5 is most concentrated across Los Angeles, I used CalEnviroScreen (2023) data, mapping percentiles at the census tract level. Percentiles rank each tract’s pollution concentration relative to all others in California, helping identify high-exposure areas. The analysis also revealed that areas with high PM2.5 pollution often overlap with neighborhoods facing significant poverty, highlighting the intersection of environmental and socioeconomic disparities.
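To make the idea of a percentile concrete, here is a minimal base R sketch that ranks a handful of made-up tract concentrations against each other. The tract names and values are hypothetical, and this simple rank-based formula is only one common way to compute a percentile; it is not claimed to be the exact method CalEnviroScreen uses.

```r
# Hypothetical PM2.5 concentrations (µg/m³) for five made-up tracts
pm25 <- c(tract_a = 8.1, tract_b = 12.4, tract_c = 10.0,
          tract_d = 11.2, tract_e = 9.3)

# Percentile = share of tracts with a concentration at or below this one
pctl <- 100 * rank(pm25, ties.method = "max") / length(pm25)

print(pctl)
```

A tract at the 80th percentile in this sketch has a concentration at or above that of 80% of the tracts being compared, which says nothing by itself about how much higher the concentration is.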

Code
# bring in enviroscreen shapefile
enviroscreen_sf <- read_sf(here("portfolio/pm2_5_la/data/enviroscreen_shapefiles/CES4_final_shapefile.shp")) %>%
  clean_names()


#-------- Tidy Data-------------

# Define excluded locations
excluded_locations <- c("Santa Clarita", "Palmdale", "Lancaster", "Acton",
                        "Agua Dulce", "Altadena", "Lake Los Angeles",
                        "Leona Valley", "La Crescenta-Montrose")

excluded_tracts <- c("6037911001", "6037910002", "6037403325", "6037104124",
                     "6037920326", "6037930301", "6037910709", "6037920303")

excluded_zips <- c("90265", "93535", "93552", "93532", "90704",
                   "91384", "91387", "91390", "93510", "93536", "91351",
                   "91011", "91355", "93551", "91342", "91381")


# Keep Los Angeles County tracts, select the columns needed for the map,
# and drop the excluded locations, tracts, and zip codes
enviroscreen_sf <- enviroscreen_sf %>%
  filter(county == "Los Angeles") %>%
  select(tract,
         zip,
         approx_loc,
         pm2_5_p,
         geometry,
         county) %>%
  filter(!approx_loc %in% excluded_locations) %>%
  filter(!tract %in% excluded_tracts) %>%
  filter(!zip %in% excluded_zips)

A choropleth map visualizing PM2.5 pollution levels across Los Angeles census tracts. Darker shades indicate higher pollution percentiles relative to all of California.

Figure 1: Spatial distribution of PM2.5 pollution in Los Angeles. Census tracts are shaded based on their PM2.5 percentiles relative to all of California, with darker shades indicating higher pollution levels. Even the neighborhoods with lower pollution sit around the 50th percentile, meaning their PM2.5 levels are higher than those in roughly half of California’s census tracts. Areas in the 90th percentile include Reseda, Van Nuys, and Central Los Angeles.
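For readers who want to try a similar map without the CalEnviroScreen shapefile, here is a small stand-in sketch. It uses the North Carolina county polygons that ship with the `sf` package and shades them by a percentile computed from one of the bundled columns (`BIR74`), purely so there is a 0 to 100 value to plot. Applying the `smog_pal` colors as a continuous gradient is my assumption for the example; the plotting code behind Figure 1 may differ.

```r
library(sf)
library(ggplot2)

# Palette option 1 from the setup chunk
smog_pal <- c("#79AAB6", "#B0C7C1", "#E3AC79", "#CC7E62", "#B77B70")

# Stand-in polygons: North Carolina counties bundled with sf
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Stand-in variable: percentile rank of a bundled column, used only to
# have a 0-100 value to shade by (it has nothing to do with PM2.5)
nc$pctl <- 100 * rank(nc$BIR74, ties.method = "max") / nrow(nc)

choropleth <- ggplot(nc) +
  geom_sf(aes(fill = pctl), color = "white", linewidth = 0.1) +
  scale_fill_gradientn(colors = smog_pal,
                       limits = c(0, 100),
                       name = "Percentile") +
  theme_void()

choropleth
```

Swapping `nc` for the tidied `enviroscreen_sf` object and `pctl` for `pm2_5_p` should give a map in the same spirit as Figure 1.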

3.4 Identifying Pollution Sources

To identify the major sources of PM2.5, I analyzed the EPA’s 2020 National Emissions Inventory (NEI), which tracks air pollution from both point and nonpoint sources. Point sources are single, identifiable emitters like power plants and factories, while nonpoint sources are more diffuse, stemming from widespread activities such as residential heating and vehicle emissions.

I combined these datasets and categorized them by source type, ranking the top emitters by total emissions. The final visualization used a horizontal bar chart to compare pollution sources, with an annotation marking the 100-ton threshold for major contributors. I explored putting the tons emitted at the end of each bar to eliminate the need for the y-axis, but I didn’t like the data-to-ink ratio.

A horizontal bar chart illustrating the top sources of PM2.5 pollution in Los Angeles for 2020, categorized into point and nonpoint sources. The x-axis displays the various pollution sources, ordered by total emissions, while the y-axis represents emissions in tons. Key sources of emissions include Industrial Processes, Residential Wood Combustion, and Light Duty Vehicles. A dashed horizontal line at 100 tons marks the EPA’s threshold for a major pollution source, with an annotation highlighting this threshold and an arrow pointing to the line.

Figure 2: Top 10 sources of PM2.5 pollution in Los Angeles in 2020, categorized into point and nonpoint sources. Point sources are specific, stationary facilities such as petroleum refineries, while nonpoint sources are more diffuse and widespread, including Industrial Processes, Residential Wood Combustion, and Light Duty Vehicles. The dashed horizontal line indicates the EPA’s threshold for major sources, defined as those emitting more than 100 tons of PM2.5 annually.
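The combine, rank, and plot steps described in this section can be sketched roughly as follows. The two input tables, their column names (`sector`, `tons`), and every number in them are invented placeholders rather than NEI data, and the sketch keeps only the top three rows to stay short. Because the bars run horizontally here, the 100-ton threshold is drawn with `geom_vline()`.

```r
library(dplyr)
library(ggplot2)

# Toy emissions tables. Sector names and tonnages are invented
# placeholders, NOT values from the National Emissions Inventory.
point_toy <- data.frame(
  sector = c("Refinery A", "Power plant B"),
  tons   = c(250, 80)
)
nonpoint_toy <- data.frame(
  sector = c("Sector X", "Sector Y", "Sector Z"),
  tons   = c(400, 120, 60)
)

top_sources <- bind_rows(
  point    = point_toy,
  nonpoint = nonpoint_toy,
  .id = "source_type"            # records which table each row came from
) %>%
  group_by(source_type, sector) %>%
  summarise(tons = sum(tons), .groups = "drop") %>%
  slice_max(tons, n = 3)         # keep the largest emitters

bar_plot <- ggplot(top_sources,
                   aes(x = tons, y = reorder(sector, tons),
                       fill = source_type)) +
  geom_col() +
  # dashed line marking the 100-ton threshold for major sources
  geom_vline(xintercept = 100, linetype = "dashed") +
  annotate("text", x = 105, y = 0.5, label = "100-ton threshold",
           hjust = 0, size = 3) +
  labs(x = "PM2.5 emitted (tons)", y = NULL, fill = NULL)

bar_plot
```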

3.5 Visualizing Pollution Over Time

My final visualization aimed to answer how PM2.5 levels have changed over time. I analyzed the EPA’s Outdoor Air Quality dataset, calculating annual mean concentrations from 2010 onward. The resulting line plot shows the trend from 2010 to 2024, with black points representing yearly averages and a red point emphasizing a 2020 spike that coincided with the Bobcat Fire. To draw attention to this anomaly, I added an annotation and an arrow pointing to the data point.

Line plot showing annual mean PM2.5 levels in Los Angeles from 2010 to 2024. A decline in pollution is visible, but levels remain above 9 µg/m³, the EPA’s health-based annual standard for PM2.5. A red point highlights the year 2020, with a spike in PM2.5 likely due to the Bobcat Fire. An arrow points to the 2020 spike, and a text annotation explains the impact of wildfires on air quality.

Figure 3: Trends in PM2.5 levels in Los Angeles from 2010 to 2024, showing an overall decline in pollution, although levels remain above the 9 µg/m³ threshold, placing LA in the moderate pollution category. A red point marks a spike in PM2.5 in 2020, presumably due to the Bobcat Fire.
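Here is a minimal sketch of the annual-mean calculation and the highlighted line plot. The daily readings, the column names (`date`, `pm25`), and the annotation coordinates are all made up for illustration; the real analysis used the EPA dataset and may have been structured differently.

```r
library(dplyr)
library(ggplot2)

# Toy daily readings (µg/m³), invented for illustration only
daily_toy <- data.frame(
  date = as.Date(c("2019-01-15", "2019-07-15",
                   "2020-01-15", "2020-09-15",
                   "2021-01-15", "2021-07-15")),
  pm25 = c(12, 10, 11, 19, 11, 9)
)

annual <- daily_toy %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year) %>%
  summarise(mean_pm25 = mean(pm25), .groups = "drop") %>%
  mutate(highlight = year == 2020)   # flag the year to emphasize

trend_plot <- ggplot(annual, aes(x = year, y = mean_pm25)) +
  geom_line() +
  geom_point(aes(color = highlight), size = 3) +
  scale_color_manual(values = c(`FALSE` = "black", `TRUE` = "red"),
                     guide = "none") +
  # dashed reference line at the 9 µg/m³ annual threshold
  geom_hline(yintercept = 9, linetype = "dashed") +
  # arrow and label pointing at the highlighted year
  annotate("segment", x = 2020.6, xend = 2020.1, y = 16, yend = 15.2,
           arrow = arrow(length = unit(0.2, "cm"))) +
  annotate("text", x = 2020.65, y = 16.2, label = "2020 spike", hjust = 0) +
  labs(x = NULL, y = "Annual mean PM2.5 (µg/m³)")

trend_plot
```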

3.6 Putting it All Together in Affinity Designer

After generating all my visualizations, I exported them from R as PDFs and imported them into Affinity Designer. As a vector-based tool, Affinity allowed for precise adjustments to the size, colors, and intricate details of the graphics. I replaced graph titles, subtitles, and legends with annotations to streamline the final look and improve the data-to-ink ratio. Additionally, I included a hand-drawn illustration comparing PM2.5 particles to the width of a human hair, to give context and scale to the size of these pollutants.
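For the export step, one straightforward option is `ggsave()` with a `.pdf` file name, which keeps the plot as vector artwork that a tool like Affinity Designer can edit. The sketch below uses a throwaway plot and a temporary file; the dimensions and file name are arbitrary choices for the example, not the settings used for the infographic.

```r
library(ggplot2)

# Throwaway plot built from a dataset that ships with R
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary location; swap in a real path when exporting
out_file <- file.path(tempdir(), "example_plot.pdf")

ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Setting `width` and `height` explicitly makes the export reproducible, since the defaults depend on the size of the current graphics device.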

3.7 Design Elements

When creating an infographic, it’s important to carefully consider various design elements. My goal was for the visualizations to connect and convey a cohesive story. In this section, I’ll walk you through my thought process and the reasoning behind my choices for the design elements listed below.

  • Graphic Form
  • Text
  • Themes
  • Colors
  • Typography
  • General Design
  • Context
  • Message
  • Accessibility
  • DEI

Graphic Form: While I experimented with several plot types, I ultimately chose a choropleth map, a bar graph, and a line plot. My goal was to use a variety of shapes to convey dynamic movement between the plots, effectively telling the story of the pollutant. I also experimented with displaying the overall AQI and all the pollutants in the AQI as bubbles, but this approach took attention away from the central focus on PM2.5 pollution in Los Angeles.

Text: To create consistency across my standalone visualizations, I gave each one a title, subtitle, and caption. I also minimized the use of axis titles where they weren’t absolutely necessary. In the final infographic, I moved away from traditional titles and subtitles, opting instead for annotations and colors to provide context and guide the reader through the story. Additionally, I used annotations in both the standalone graphs and the final infographic to emphasize key points, such as the 100-ton threshold for major pollution sources in the bar graph.

Themes: My general aesthetic preference is quite minimal, so I chose to keep the plot themes simple to allow the bright colors to stand out. This involved removing legends, axis text, axis lines, and background grids. Most of these elements weren’t essential for conveying meaning and could be effectively replaced with annotations in the final graphic.

Colors: I had a lot of fun experimenting with colors! As mentioned earlier, I chose a photograph of a smoggy downtown Los Angeles, which features distinct layers of smog and sky. Using the color picker in Affinity, I extracted the colors from the image. I then checked them for colorblind accessibility and tested them in grayscale to ensure they remained distinguishable. To further improve accessibility, I adjusted the saturation and opacity. I’m really happy with the palette I ended up with. It transitions from a cool blue to a bright terracotta, capturing the sky in LA. The cool-to-warm gradient also helps illustrate the progression of pollution levels, from low to high.
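A quick way to sanity-check a palette for grayscale readability is to compare the relative luminance of its colors. The sketch below applies the WCAG relative-luminance formula to the `smog_pal2` colors from the setup chunk using only base R. It is an illustrative check I am adding here, not the tool used in the original workflow, and the helper name `rel_luminance` is made up for this example.

```r
# Palette option 2 from the setup chunk
smog_pal2 <- c("#79AAB6", "#B0C7C1", "#DFAF75", "#DE8635", "#DF674F")

# Relative luminance following the WCAG definition:
# linearize each sRGB channel, then take a weighted sum.
rel_luminance <- function(hex) {
  rgb <- col2rgb(hex) / 255                      # 3 x n matrix in [0, 1]
  lin <- ifelse(rgb <= 0.04045,
                rgb / 12.92,
                ((rgb + 0.055) / 1.055)^2.4)
  as.numeric(c(0.2126, 0.7152, 0.0722) %*% lin)  # one value per color
}

lum <- setNames(rel_luminance(smog_pal2), smog_pal2)

# Smallest gap between neighboring luminance values; a tiny gap suggests
# two colors that could be hard to tell apart in grayscale
min_gap <- min(diff(sort(lum)))

print(round(lum, 3))
print(round(min_gap, 3))
```

Colors with nearly identical luminance are the ones worth adjusting or keeping apart in the layout, which matches the saturation tweaks described above.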

Typography: I used two fonts throughout my infographic to ensure consistency and readability. Montserrat was reserved for the main title, while Open Sans was used for all other text. Both are sans-serif fonts, chosen for their clarity and modern aesthetic, a subtle contrast to the theme of polluted air. To enhance readability and emphasize important details, I used bold text, often paired with color, to highlight key points in annotations and draw attention to critical statistics.

General Design: I designed the infographic to guide the viewer’s eyes in a natural reading flow, moving from left to right and then down, similar to reading a book. However, I was also mindful that not everyone follows the same reading pattern, so I ensured that each graph could stand alone and be understood in any order. That said, I placed the illustration of PM2.5 and its context right at the top, just below the title, as I felt it was crucial for understanding the rest of the infographic.

Context: I spent a lot of time refining the story so that it naturally guides the viewer toward key insights while still allowing for personal interpretation. I provided context through the header, the PM2.5 illustration, and concise plot annotations. Instead of over-annotating with additional explanations or takeaways, I used select highlights to guide the reader and let the data speak for itself.

Message: While this project initially started as an exploration without a specific message in mind, a clear takeaway emerged: PM2.5 pollution in Los Angeles is a significant issue, with distinct spatial and temporal patterns. The infographic serves more as an introductory overview (think PM2.5 Pollution 101) than an in-depth analysis, providing viewers with a foundational understanding of the issue.

Accessibility: As mentioned earlier, I created my own color palette based on a photo of smog. Some of the colors were too similar, which could cause accessibility issues for viewers with color blindness or when viewed in grayscale. To address this, I adjusted the saturation of certain colors to create more contrast. I also made sure to avoid placing colors that were too similar next to each other, unless they were part of a gradient. Additionally, I added alt text to all of my visuals to further ensure accessibility for all viewers.

DEI: I knew I wanted to incorporate a DEI aspect into my choropleth map of PM2.5 pollution in Los Angeles. When I noticed pollution hotspots in Reseda and Central Los Angeles, I decided to check whether these areas also ranked high on the CalEnviroScreen 4.0 percentile range. CalEnviroScreen 4.0 is a tool used to assess cumulative environmental, health, and socioeconomic impacts in California, and its percentile range ranks areas based on factors like water quality, proximity to hazardous waste sites, and poverty levels. Both Reseda and Central Los Angeles ranked very high, indicating high cumulative impacts, including both high PM2.5 levels and significant socioeconomic stressors. To represent this, I initially created a bivariate map. However, I struggled to explain what the CalEnviroScreen percentile range meant within the limits of an infographic. To make the map easier to interpret, I simplified it to focus solely on PM2.5 pollution and poverty. Even so, I found it difficult to display the data clearly, as I would need to explain each of the combinations (high pollution, high poverty; low pollution, high poverty; and so on) for the map to make sense to viewers. Since the bivariate map and the pollution distribution map looked very similar, I decided to show just the distribution of PM2.5 pollution in Los Angeles and note the poverty aspect in the annotations. I think this made the map easier to understand while still highlighting the major environmental justice issue at play.

4 Takeaways

So was the air quality really worse this past December? The data tells a nuanced story. While PM2.5 levels have declined since 2010, Los Angeles still exceeds the EPA’s annual standard of 9 µg/m³. The December 2024 haze wasn’t an anomaly, but a visible reminder of our ongoing air quality challenges.

My analysis revealed several key insights:

  • Spatial inequality persists: Pollution disproportionately affects certain neighborhoods, often overlapping with areas facing socioeconomic challenges.

  • Pollution stems from diverse sources: Industrial activities, residential heating, and vehicles all contribute to PM2.5 pollution, creating a complex challenge, especially in Los Angeles, where the basin’s geography concentrates and traps pollutants.

  • Progress is happening, but challenges remain: While environmental regulations are improving air quality, pollution levels continue to exceed healthy standards. Wildfire events, which release significant amounts of pollutants, highlight our ongoing vulnerability to climate-related threats to air quality.

Reflecting on that smoggy December day, I realize my perception wasn’t wrong—LA’s air quality remains a challenge despite improvements. What has changed is my awareness of what we’re breathing. For fellow Angelenos, I offer reassurance of progress, along with a reminder that the work for cleaner air is far from over. So the next time you see that hazy skyline, remember it reflects both our progress and the work still ahead.

5 Explore full code

If you want to explore the full code, I’ve included it below for reference:

  • Copyright 2026, Natalie Smith
 